-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7862][SQL]Fix the deadlock in script transformation for stderr #6404
Conversation
val proc = builder.start() | ||
val inputStream = proc.getInputStream | ||
val outputStream = proc.getOutputStream | ||
val reader = new BufferedReader(new InputStreamReader(inputStream)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS your IDE settings are making spurious whitespace changes. It clutters the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed config the "Strip trailing spaces on save" to be "ALL". Thanks for pointing this out.
Test build #33507 has finished for PR 6404 at commit
|
Test build #33554 has finished for PR 6404 at commit
|
/cc @chenghao-intel |
@@ -58,6 +59,7 @@ case class ScriptTransformation( | |||
child.execute().mapPartitions { iter => | |||
val cmd = List("/bin/bash", "-c", script) | |||
val builder = new ProcessBuilder(cmd) | |||
builder.redirectError(Redirect.INHERIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
builder.redirectError(Redirect.INHERIT)
would consuming the error output from buffer and then print it to stderr (inherit the target from the current Scala process).
If without this there would be 2 issues:
- The error msg generated by the script process would be hidden.
- If the error msg is too big to chock up the buffer, the input logic would be hang (There's are simple steps on jira to reproduce this behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the comment in the source code.
LGTM |
Test build #34742 has finished for PR 6404 at commit
|
Thanks, merging to master. |
After this patch, the pull request builder logs show 50000 lines of stdout output, which makes them hard to read:
At least I think that this is the patch that caused this issue. If that's the case, could someone open up a followup PR to fix this? |
Oh, sorry @JoshRosen the meaningless output is caused by #5671, I will fix it with a followup PR. |
} | ||
|
||
test("test script transform for stderr") { | ||
val data = (1 to 100000).map { i => (i, i, i) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test will flush the stderr while testing, should we generate less data or hide it someway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Filed SPARK-8508 and PR #6925 to shrink unnecessary output.
[Related PR SPARK-7044] (apache#5671) Author: zhichao.li <zhichao.li@intel.com> Closes apache#6404 from zhichao-li/transform and squashes the following commits: 8418c97 [zhichao.li] add comments and remove useless failAfter logic d9677e1 [zhichao.li] redirect the error desitination to be the same as the current process
This is a follow up of #6404, the ScriptTransformation prints the error msg into stderr directly, probably be a disaster for application log. Author: Cheng Hao <hao.cheng@intel.com> Closes #6882 from chenghao-intel/verbose and squashes the following commits: bfedd77 [Cheng Hao] revert the write 76ff46b [Cheng Hao] update the CircularBuffer 692b19e [Cheng Hao] check the process exitValue for ScriptTransform 47e0970 [Cheng Hao] Use the RedirectThread instead 1de771d [Cheng Hao] naming the threads in ScriptTransformation 8536e81 [Cheng Hao] disable the error message redirection for stderr
Related PR SPARK-7044